Data types overview

A brief history of data types

  • All[1] computers store data in binary (1s and 0s); the example on the right shows raw data represented as hexadecimal
  • Variables add a level of convenience and abstraction by letting us name specific buckets to put data in, and data types give structure to these buckets.
  • In the early days of computing, data was stored as raw binary
  • The need for specific data types came from the emergence of structured programming from the 1950s onward
  • Languages like FORTRAN and COBOL introduced separate numeric and character data types
  • Object-oriented languages like C++ and Java further expanded on this with user-defined data types
  • Specifying the type of data allows the machine to allocate an appropriate amount of memory to it (this was very important in the early days of computing, and is still relevant)
  • It also helps us prevent errors by setting an expectation of the exact type of data that a specific variable will contain.
Raw data in hex format (ASCII representation on right).
Core rope memory. The orientation of each metal ring encodes a 0 or a 1. (Konstantin Lanzet, Wikimedia Commons)
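
To make the point about raw binary concrete, here is a minimal Python sketch showing that the same bytes can be read as text or as a number, depending on the type we choose to interpret them with. The byte values are an illustrative choice and are not taken from the figure.

```python
# Five raw bytes written in hexadecimal (an illustrative value, not from the figure).
raw = bytes.fromhex("48656c6c6f")

# Interpreted as ASCII text, the bytes spell a word.
as_text = raw.decode("ascii")

# Interpreted as a big-endian unsigned integer, the first four bytes are a number.
as_number = int.from_bytes(raw[:4], byteorder="big")

print(raw.hex())   # the raw data shown as hex digits
print(as_text)     # Hello
print(as_number)   # the same bits, read as an integer
```

Nothing in the bytes themselves says which reading is right; the data type supplies that information.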

A quick note on type systems

Programming languages have different philosophies about types. They are often referred to as being “strong” or “weak” and “static” or “dynamic”.

Types of type checking

Statically-typed languages (C++, Rust)

  • The programmer has to specify the data type of a variable or object in the code itself, and these types are checked at compile time
  • Benefits: safety (catches errors early), possibly more performant
  • Downsides: more tedious and less flexible

Strongly but dynamically-typed languages (e.g. Python)

  • Python features dynamic typing: there is no need to explicitly declare a variable as being a specific data type. It is also strongly typed: it allows limited implicit conversions (e.g. an int becoming a float in arithmetic), but far fewer than e.g. JavaScript.
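
As a rough illustration of what strong but dynamic typing looks like in practice, the sketch below rebinds one name to values of different types, mixes an int with a float, and then tries to add an int to a string. The variable names are illustrative only.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 5
first_type = type(value)     # int
value = "five"
second_type = type(value)    # str

# Limited implicit conversion: an int mixed with a float is promoted to float.
mixed_sum = 5 + 2.0          # 7.0

# Strong typing: Python refuses to guess how to add an int and a str.
try:
    result = 5 + "2"
except TypeError as error:
    result = None
    error_message = str(error)

# An explicit conversion makes the intent clear.
as_text = str(5) + "2"       # "52"
as_number = 5 + int("2")     # 7
```

The TypeError is the useful part: the mistake surfaces at the line where it happens instead of silently producing an unexpected string.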

Weakly-typed languages (e.g. JavaScript)

  • Allow extensive type coercion, i.e. free mixing and matching of data types: e.g. 5 + "2" evaluates to the string "52"
https://remotescout24.com/en/blog/806-typed-vs-untyped-programming-languages
C++. This code generates a type error; we tried to assign a string value to an int
JavaScript. This is valid JS code and ends with z being a string with the content “52”

Data types in Python

Overview

A logical overview of the basic data types in Python. From https://pynative.com/python-data-types/

Booleans

Like most programming languages, Python has a bool data type. In some ways, this is the simplest type available. A Boolean can only be True or False, and is what you get back when evaluating a comparison. For example:

our_result = 10 > 9
print(our_result)

This prints True: we’re asking Python for the result of the comparison 10 > 9, and to store this result in a variable called our_result. The data type of a true-false comparison result like that is bool, so our variable will also be of this type.
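
One way to confirm the type for yourself is the built-in type() function. The short sketch below repeats the comparison and adds two more for contrast; the extra variable names are illustrative.

```python
our_result = 10 > 9
result_type = type(our_result)   # bool

# A comparison that does not hold gives False.
other_result = 3 > 4

# not flips a Boolean to its opposite.
negated = not our_result

print(our_result, result_type)
print(other_result, negated)
```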

Booleans will become highly relevant when we talk about conditionals and program flow.

George Boole (1815-1864) - the originator of Boolean logic

Numeric types

Numeric types are for variables that will only contain numbers. Other programming languages often have many different numeric types, but Python (mercifully) only has two that you will use day to day:

int can contain any[2] whole number (no fraction or decimals): negative, positive or zero. E.g.

  • a = -4
  • b = 3
  • c = 9087358292578

float can contain any number with a decimal point, to a high but finite[3] precision (roughly 15 to 17 significant digits). E.g.

  • x = -2.2
  • y = 3.0
  • z = 2452.259259999999999

If you’re manually assigning a number to a variable, Python will choose int or float depending on whether you’ve used a decimal point or not, so 2 and 2.0 are stored as different types even though they compare as equal.
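
The sketch below checks these points directly: how a literal's type is chosen, how large an int can grow, and where float precision runs out. It assumes a typical Python build in which float is a 64-bit double; the variable names are illustrative.

```python
import sys

whole = 2
decimal = 2.0

whole_type = type(whole)        # int
decimal_type = type(decimal)    # float

# The values compare equal, but the types differ.
same_value = whole == decimal
same_type = type(whole) is type(decimal)

# Python ints grow as needed, so very large whole numbers are fine.
big = 2 ** 100

# Floats have finite precision: this is how many decimal digits are always preserved.
reliable_digits = sys.float_info.dig

# A classic consequence of finite precision.
rounding_surprise = (0.1 + 0.2 == 0.3)

print(whole_type, decimal_type, same_value, same_type)
print(big, reliable_digits, rounding_surprise)
```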

Data structures

With data structures, we can address an element or elements by using square bracket notation - more on this below.

Strings (str)

An ordered sequence (string) of letters[4], enclosed by quotation marks (single or double). E.g.

  • our_string = "Hello world"
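
A short sketch of square bracket notation on a string follows. Positions start at 0, and negative positions count from the end. The variable names other than our_string are illustrative.

```python
our_string = "Hello world"

first_char = our_string[0]      # first character
last_char = our_string[-1]      # last character
first_word = our_string[0:5]    # a slice: positions 0 up to (not including) 5
length = len(our_string)        # number of characters

print(first_char, last_char, first_word, length)
```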

Lists (list)

An ordered sequence of objects, where each object can be of any data type (int, float, str, bool, etc.). Enclosed by square brackets, with the items separated by commas. E.g.

  • our_list = [1, 2.3, "abc"]
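
The same square bracket notation works on lists and, unlike strings, list items can be replaced or added after creation. A minimal sketch, with illustrative variable names and replacement values:

```python
our_list = [1, 2.3, "abc"]

first_item = our_list[0]                                  # the item at position 0
item_types = [type(item).__name__ for item in our_list]   # the type name of each item

# Lists are changeable: replace one item and add another at the end.
our_list[1] = 4.5
our_list.append(True)

print(first_item, item_types)
print(our_list)
```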

Dictionaries (dict)

Dictionaries are collections of key-value pairs, where each entry maps a key to a value. Enclosed by curly braces, with each key and value separated by a colon and each pair separated by a comma. E.g.

  • our_dict = {"org_code":"0DF","name":"SCW CSU","year": 2013}
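
With a dictionary, the square brackets take a key instead of a position. The sketch below looks up, updates and adds entries; the "active" and "postcode" keys are hypothetical and used only for illustration.

```python
our_dict = {"org_code": "0DF", "name": "SCW CSU", "year": 2013}

name = our_dict["name"]             # look up a value by its key

our_dict["year"] = 2014             # update an existing entry
our_dict["active"] = True           # add a new entry (hypothetical key)

# .get() returns None instead of raising an error when a key is missing.
missing = our_dict.get("postcode")  # hypothetical key that is not present

keys = list(our_dict)               # keys, in insertion order

print(name, our_dict["year"], missing)
print(keys)
```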

Other data types

Built-in

  • We’ve skipped over complex numbers and tuples, the latter being like a list but non-changeable.
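
For completeness, here is a brief sketch of both skipped types. A tuple is written with round brackets and indexed with square brackets, but cannot be modified once created; a complex number is written with a j suffix on the imaginary part. The values and names are illustrative.

```python
our_tuple = (1, 2.3, "abc")
first_item = our_tuple[0]      # indexing works with square brackets

# Trying to change an item raises a TypeError, because tuples are non-changeable.
try:
    our_tuple[0] = 99
    changed = True
except TypeError:
    changed = False

our_complex = 3 + 4j
real_part = our_complex.real   # the real part, as a float
imag_part = our_complex.imag   # the imaginary part, as a float
magnitude = abs(our_complex)   # distance from zero

print(first_item, changed)
print(real_part, imag_part, magnitude)
```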

Other packages

  • You may have heard of other data types such as arrays (which are kind of like lists, but can have two or more dimensions).
  • Arrays are not a built-in Python type but are offered by the numpy package.
  • pandas also offers additional data types such as timestamp (similar to SQL’s datetime).
  • dataframes (from pandas) are an example of a higher-order class that makes use of datatypes within it; remember from previous sessions that a dataframe can contain strings, integers, timestamps etc.

Thank you!

Final thoughts

Don’t worry about memorising any of this! If you take one thing away from this session, make it the fact that data types exist, that being aware of them will help you understand problems with your code, and that resources and documentation are readily available online.

Further reading

Footnotes

  1. experimental ternary computers and quantum computing are firmly out of scope of this presentation

  2. there is no clearly-defined maximum number for an integer in Python; certainly not one you’re likely to ever encounter

  3. again, limits exist but aren’t relevant here

  4. or numbers, symbols, etc. - any valid Unicode character